Overview

Based on the locational uncertainty EDA, we have decided to use a dataset that merges the CORINE and land use datasets using the coarser CORINE dataset where the Vaud land use dataset is unavailable. The spatial detail of the OpFo dataset (created for the structured sampling design based on detailed maps from the Swiss government) resulted in a lot of uncertainty in sample categorization, even assuming best GPS recording practices by the collectors. The other two datasets were much less ambiguous, with 10m buffers largely falling into a single category. Thus, we will try to use the greater thematic accuracy of the land use dataset where possible, with the CORINE dataset as essentially the baseline.

This script first explores the union of the two datasets, where the detailed land use category is assigned if available, and the CORINE category if it is not. For analyses, the joint legend will be simplified to a smaller number of categories.

The union of CORINE and the Vaud land use layers was performed in QGIS 3.4.11.


Filter points

Using the method described in the coordinate uncertainty EDA, points with clearly unreliable coordinates are removed. We assume that the remaining coordinates were collected under good GPS conditions, with the location recorded where the ants were collected. We use a 5m buffer to account for the limitations of GPS coordinates collected by smartphones.

geo_exclude <- c("extrapolé mauvais", 
                 "extrapolé (base tube précédent)", 
                 "extrapolé (église par défaut)", 
                 "extrapolé (gare par défaut)", 
                 "extrapolé/corrigé (église par défaut)")
dec_thresh <- 4  # remove lat/lon with 0-3 decimals
pub_filt <- ant$pub %>% 
  filter(!is.na(GEOPRECISION)) %>%
  filter(!GEOPRECISION %in% geo_exclude) %>%
  filter(is.na(LATITUDE) | nchar(LATITUDE) >= (dec_thresh+3)) %>% 
  filter(is.na(LONGITUDE) | nchar(LONGITUDE) >= (dec_thresh+2)) 
pub.5m <- pub_filt %>% st_buffer(dist=5)

Land cover dataset

There are 153 categories in Vaud after overlaying the Vaud land use dataset with the CORINE dataset, prioritizing the Vaud land use dataset. First, we can just look at the distribution within Vaud for categories comprising at least 0.5% of the area.

Or scroll through all categories, sorted by percent coverage:


Intersect with points

The 5m-buffered points need to be intersected with the landcover dataset. The categorization is assigned as the most abundant category within the 5m buffer, following methods in the locational precision EDA.

lc_VD <- st_make_valid(lc_VD)
full_obs.df <- intersect_with_composition(pub.5m, lc_VD, "cat_full")
st_write(full_obs.df, paste0(gis_dir, "opfo_corine_use_21781.shp"))
full_obs.df <- st_read(paste0(gis_dir, "opfo_corine_use_21781.shp")) %>%
  categorize_buffers(., "cat_full", "full") %>%
  mutate(across(starts_with("cat_"), as.character),
         Dataset=if_else(grepl(":", cat_full), "CORINE", "VD_LU"),
         code_full=paste0(str_sub(Dataset, 1, 2), "_", str_sub(cat_full, 1, 3)))
## Reading layer `opfo_corine_use_21781' from data source `/Users/tsz/Documents/unil/opfo_main/2_gis/data/VD_21781/opfo_corine_use_21781.shp' using driver `ESRI Shapefile'
## Simple feature collection with 7921 features and 4 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 496366 ymin: 116314 xmax: 585184 ymax: 202959
## CRS:            21781

In all of Vaud, there are 153 categories. Using the 5m buffers, there are 61 categories.

With the joint GIS layer, there is a bit more uncertainty compared to each dataset separately, but that is almost inevitable, since the large CORINE polygons are broken up by the smaller land use polygons, and the land use polygons now have CORINE polygons where before there was no data. Still, it is better than the OpFo dataset, with 85% of buffers capturing only a single land use / land cover type.

As hinted in the above plot, a large majority of collections (86.9%) fall into the CORINE polygons. Many of these are in urban (35.5%) or forested (25.3%) areas.

There are a few CORINE categories that are somewhat vague catch-alls. Looking at a satellite overlay in QGIS, the largest patches of 211: Non-irrigated arable land, 243: Land principally occupied by agriculture, with significant areas of natural vegetation, and 321: Natural grasslands are pastures, though not exclusively. In some areas, they include smaller patches of buildings or forest. This is inevitable with CORINE, which has a minimum mapping unit of 25ha. The documentation also only states a thematic accuracy of ≥85%, without additional detail.


Output full categories

Output categories to upload to google drive, where they will be aggregated.

lc_VD %>% st_set_geometry(NULL) %>%
  group_by(Dataset, cat_full) %>% summarise(tot_area=as.numeric(sum(area))) %>%
  ungroup %>% 
  mutate(prop_area=tot_area/sum(tot_area)) %>%
  mutate(in_collections=cat_full %in% full_obs_props$cat_full,
         prop_collections=full_obs_props$prop_area[match(cat_full,full_obs_props$cat_full)],
         code_full=paste0(str_sub(Dataset, 1, 2), "_", 
                          str_sub(cat_full, 1, 3))) %>%
  arrange(desc(in_collections), Dataset, cat_full) %>%
  write_csv("temp/corine_open_category_lookup.csv")

Ambiguous categories

For several categories, it is not obvious (to me) how they should be re-categorized. These are shown here, with examples of satellite imagery.

141: Green urban areas

Urban?
141 141_a 141_b 141_c

211: Non-irigated arable land

Pasture?
211 211_a 211_b 211_c

242: Complex cultivation patterns

Urban?
242 242_a 242_b 242_c

243: Land principally occupied by agriculture, with significant areas of natural vegetation

Pasture?
243 243_a 243_b 243_c

322: Moors and heathland

Transition? Pasture?
322 322_a 322_b 322_c

324: Transitional woodland-shrub

Transition? Pasture?
324 324_a 324_b 324_c

411: Inland marshes

Pasture? Wetland?
411 411_a 411_b 411_c

545 Cult. mar. plein champ

Agricultural tilled? non-tilled?
545 545_a 545_b 545_c

717 Surf. viti. biodiv.

Urban? Transition? Agriculture?
717 717_a 717_b 717_c

720 Châtaigneraies

Forest?
720 720_a 720_b 720_c

801 C mar s abri avec fond.

Agriculture tilled? But maybe in a greenhouse…?
801 801_a

851 Surfaces a litière

Pasture? 851 851_a

902 Surfaces improductives

Urban?
902 902_a 902_b 902_c

907 Chem nat non stabilisés

Pasture?
907 907_a 907_b

998 Autres surfaces hors sau

Urban? Agriculture non-tilled? Prairie?
998 998_a